Natural Language Engineering Techniques (Tehnici de Ingineria Limbajului Natural)
Course 11: The pragmatic level
Cognition triggered by affordances, conceptualisations and language formation
Course: Dan Cristea
Laboratories: Diana Trandabăț, Mihaela Onofrei, Daniela Gîfu, Ionuț Pistol

The pragmatics layer
[Diagram: processing layers from initial text to result: sub-syntactic, syntactic, semantic, discourse and pragmatic processing]
[Diagram labels: voice-controlled robots; cognition in connection with language; acquisition of language; sentiment & humour; fake news, auto-trust]

Cognition: integration of language, vision and action
(with information and images from K. Pastra and Y. Aloimonos)

Spectrum of research and systems
•Going back to AI and robotics:
–the SHRDLU system verbalized visual changes in a 2D blocks scene,
–medium translation systems (e.g. automatic sports commentators),
–multimedia presentation systems (e.g. automatic creation of illustrated technical manuals),
–robots controlled through voice

POETICON++ (2012-2016)
•Advocates that language is necessary for robots to generalise behaviours and perceptions
–puts forward a concrete methodology for developing cognitive computational mechanisms
•There is growing evidence that language is inherently connected to action and perception
–computational and engineering research in cognitive robotics

POETICON++ (2012-2016)
•Cognitive robotics research: geared towards closing the loop between robot sensing, robot acting and robot learning
–language kept as a communication interface with humans
–language (through its hierarchical and compositional structure) can play a dynamic role and should be included in the loop
•POETICON++ showed why language is significant for the next generations of robots

iCub: a robot which listens and acts

iCub: the robot trained in the POETICON++ project
Language in correlation with visual abilities, motor capacities and experience of the world

Associating objects with words
•Machine learning methods: the learning module helps in developing resources
on its own through supervised learning:
–a human feeds the robot with a word which corresponds to the image of an object in view;
–the robot generates a simplified representation of the image of this object as feature-value vectors (colour and shape attributes);
–once a number of associations are learned, the agent compares the representation of the image of any new object with the ones it knows;
–using e.g. a nearest neighbour algorithm, the system associates the new object with the name of the known object it is most similar to

Grounding resources: PRAXICON
•A resource that links natural language and sensorimotor representations of concepts, with the aim of facilitating multimodal content integration in cognitive systems
–going bottom-up in the resource (from sensorimotor representations to concepts) one gets a hierarchical composition of human behaviour,
–going top-down (from concepts to sensorimotor representations) one gets intentionality-laden interpretations of those structures
Katerina Pastra (2008): PRAXICON: The Development of a Grounding Resource. Proceedings of the International Workshop on Human-Computer Conversation, Bellagio, Italy

Visual and language cognition: the team of Prof. Aloimonos

The robot hand performs an action: understanding sequences of images

Decomposing actions
•Three main ‘morpho-syntactic’ features which characterize human actions and can be employed for defining action terminals and non-terminals:
–tool complement (tc): the effector of a movement, a body part, a combination of body parts or the extension of a body part with a graspable object used as a tool;
–object complement (oc): any object affected by a tool-use action;
–goal (g): the final purpose of an action sequence of any length or complexity

Decomposing actions
K. Pastra and Y. Aloimonos (2012): The minimalist grammar of action. Philosophical
Transactions of the Royal Society B: Biological Sciences, 367(1585):103–117

From semantics to pragmatics
•semantics: A salad can be tossed; A spoon is a tool; A carrot can be cut
•pragmatics:
–valid affordances: Eat soup with the spoon
–refused affordances: Cut with spoon the carrot!

Semantic reasoning based on affordances
•Suppose an object is identified as a rake (by vision-based classification) → use knowledge stored in a language-based semantic network (e.g. WordNet) → it can be chosen as a good candidate for bringing other objects closer
•It is difficult to express symbolic rules for the huge variety of objects that can be used as tools

Learn affordances by examples
•Complement the linguistic knowledge with a probabilistic model of affordances learned by the robot through self-exploration
–the robot performs numerous experiments on a set of objects placed on a table;
–one object is selected for grasping (intermediate object) and another object is selected to be acted on;
–no concept of “tool” exists yet: objects and actions are randomly selected during the learning process and a causal model of the occurrences is obtained in the form of a Bayesian Network

Visual descriptors help to learn affordances
•2D visible silhouette:
–segment the object into connected components of pixels (“blobs”)
–describe blobs by basic sets of visual features
•contour perimeter, area, polygonal approximation, convex hull, approximating ellipse, minimum-enclosing circle and minimum-enclosing rectangle
–use these shape primitives to assess similarity between objects even if they belong to different conceptual classes, because they are very general and do not require a categorization of the object
•e.g.: toothpicks and straws might be categorically different but, due to their similar shapes, both might afford stirring the coffee

Visual descriptors help to learn affordances
•Measure the success of different triplets of action, tool and object, in which the tool and the object are feature-based characterized, and the action is localized
in an ontology:
–Get-closer with the rake the ball! → SUCCESS
–Stir with the straw the coffee! → SUCCESS
–Get-closer with the straw the ball! → FAIL
–…
•Associate names to tools, actions and objects
•Fill in missing tools or objects: Stir the coffee! (tool X unspecified) → X = straw, toothpick, tea spoon

Words help associate affordances
•When visual and word stimuli were presented concurrently:
–iCub associated a label with the features characterizing an object and the learned affordances
–after learning, when only a spoken input was presented for an action and an object, the affordances were remembered and the correct tool was selected
Finally, iCub decides by itself how to solve a command

Conceptualization in Language
An Evolutionist View
This course is based on a talk given at the COST A31 project meeting, Budapest, May 2008
Some pictures belong to Luc Steels, SONY Laboratories, Paris

The ALEAR project (2008-2010)
•How did concepts arise in humans and how have they been expressed in language?
–Why do we speak the way we do?
–Why are there so many languages although most of us operate with the same concepts?
–Is there a way to prove scientifically hypotheses about the evolution of languages?

The “Talking Heads” experiment
•Goal: how did language evolve?
–In a community, over time: language emerges through self-organisation
–In individuals, meaning is built in a cumulative growth process

ALEAR: Main Objective
“Carefully controlled experiments in which autonomous humanoid robots self-organise rich conceptual frameworks and communication systems with similar features as those found in human languages.”

Synthesis of intelligence approaches
•knowledge-based or symbolic: operationalizing models from logic, generative linguistics and cognitive psychology
•machine learning: copy intelligence by learning
•behaviour-based: put the study of AI on biological grounds (see also earlier cybernetics)
→ neural (including deep): copy the physical realization of the brain, make variations and study the effects

Approaches in studying the evolution of language
•Whole systems approach: physical embodiment, sensory-motor, perception, conceptualisation, language
•Self-generated: as opposed to designed or acquired through inductive machine learning
•Multi-agent: as opposed to stand-alone
•Evolutionary: start from scratch and see how a communication system forms and further develops

Setting
•Cognitive agents:
–Physical aspects: body, sensors, articulators, physical location, objects and agents located in the environment
–Mental properties: behaviour, memory, lexicon, grammar, etc.
•The two aspects are separated: a real agent exists only when a virtual agent is loaded in a physical robot body

The robots
•Physically embodied autonomous agents
–Motor-sensory processing: perception, movements, actions
–Conceptual processing: recognise objects, learn a lexicon, build representations of concepts
•towards the development of grammar

Teleporting
•Develop categories in one location and enrich the agent's learning experience by moving to another location
•The transmission of language from one generation to the next
•Intercultural exchange and language contacts by migrating mental bodies to different parts of the world

The “guessing game”
•Two physically
instantiated agents: speaker and hearer
•Why game?
–Because neither agent can look into the mind of the other. They only interact through the external environment
•What triggers the game behaviour?
–There is an innate, programmed motivation: agents try to maximise their communicative success; this comes from the necessity to survive

Interactive games
•Rules:
–agents can interact only in conformity with stimuli coming from the external environment
–they are bound to maximise their communicative success (programmed)
•Acquisitions:
–guessing games: develop the vocabulary and abstract concepts by saying/guessing/pointing
–develop space and time conceptualisations: events
–develop rudiments of syntax

The protocol of a game
-One agent (speaker) chooses an object (focus), conceptualises it and, without pointing to it, emits a string description conforming to its lexicon
-The second agent (hearer) parses that lexical description, matches it against its own conceptualisations and points to one object it believes is the focus object
-If they match, the game is successful → both agents increase their confidence in the mapping between conceptual space and lexicon for the words they used
-If the game fails → they decrease their confidence and the hearer learns another connection between the real focus and the description string

Perception and categorisation
•Sensory channels
–software processes interpreting specific real-world information:
•HPOS (horizontal position)
•VPOS (vertical position)
•GRAY (gray level)
•others
–domain: 0-10 (continuous), discretised to a number of discrete values → categorisation
–categorisation could be specific to individuals

The categories trees
•Each individual can develop its own tree of categories for each sensory channel
•Example for HPOS:
–agent A: left | right
–agent B: left (extreme left, middle left) | right (middle right, extreme right)

Categorisation in individuals
This object will be interpreted as… left by agent A, middle left by agent B
Important notice: the symbols left and middle left are our conventions to notate the HPOS property values; they do not belong to the acquired lexicon!
[Diagram labels: Sensori-motor, Conceptual, Form]

About perception again: why do we use some features and not others?
•Salience: the property of one feature to distinguish the topic in the context:
–the minimum distance between the topic’s value for that feature and all the other objects’ values for that feature
[Figure: scene with objects 1, 2, 3 and channels HPOS, VPOS, HEIGHT, WIDTH, GRAY, AREA]

About perception again: why do we use some features and not others?
After scaling:
obj   HPOS  VPOS  HEIGHT  WIDTH  GRAY  AREA
1     0.25  0.45  0.30    0.66   0.45  0.70
2     0.20  0.32  0.40    0.50   0.90  0.74
3     0.42  0.31  0.50    0.30   0.42  0.76
sal   0.05  0.01  0.10    0.16   0.45  0.02
(the sal row is consistent with object 2 being the topic, e.g. for HPOS: min(|0.20-0.25|, |0.20-0.42|) = 0.05; GRAY is the most salient channel)

Lexicalisation: associating meanings to words
Game 125
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[right]
B stores “mo” as HPOS[right]

Lexicalisation: interpretation
Game 205
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B interprets “mo” as HPOS[right]
B points to the topic
A says “OK”

Lexicalisation: synonymy
Game 245
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[right]
B has a word for HPOS[right]: “mogash”
B stores “mo” as a synonym for “mogash”

Differences in conceptualization produce “subtle” social polysemy
Game 280
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[middle right]
B stores “mo” as HPOS[middle right]
•A has a two-value conceptualization of HPOS
•B has a four-value conceptualization of HPOS

Subtle social differences in meaning can give rise to generalisations
Game 302
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B knows
“mo” as HPOS[middle right]
B does not recognize an object in the scene having this value
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[extreme right]
B stores “mo” as HPOS[extreme right]
Now B knows “mo” as both HPOS[middle right] and HPOS[extreme right]. By repetition it can infer a new category which subsumes both, which should be HPOS[right], and this will be called “mo”

Ambiguity-1
Game 325
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[right] and VPOS[low] and GRAY[light]
B stores “mo” as HPOS[right] OR VPOS[low] OR GRAY[light]

Recovering from ambiguity
It is not known a priori whether “mo” will be stabilized by B as only HPOS[right] or only VPOS[low] or the union of the two
However, by positive feedback the lexicon will converge towards an efficient usage
An example:
Game 340
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B knows “mo” as HPOS[right] OR VPOS[low] OR GRAY[light]
B recognizes an object in the scene with the value HPOS[right] and no object with the value VPOS[low] or GRAY[light]
B points to the topic
A says “OK”
B diminishes the meaning of “mo” as VPOS[low] and GRAY[light] and augments its meaning as HPOS[right]

Ambiguity-2
Suppose Game 325 takes place as follows, instead:
Game 325’
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as VPOS[low]
B stores “mo” as VPOS[low]
At this moment A and B understand different concepts by “mo”

Ambiguity maintained
After Game 325’, A knows “mo” as meaning HPOS[right] and B acquired it as VPOS[low]
Then we have this:
Game 390
A segments the scene in 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B knows “mo” as VPOS[low]
B recognizes an object in the scene with
the value VPOS[low]
B points to the topic
A says “OK”
Now, again, the two agents do not realize that they give different meanings to “mo”

What is proved?
•A lexicon in a single agent
–new words are invented or adopted
–scores of association between forms (words) and meanings (concepts) go up and down
–a “virgin”, “newly born” agent will catch up with a lexicon already existent in a population
•A common lexicon that stabilizes in a group of agents
–however, differences could still coexist
–the lexicon is sensitive to the growth or reduction of the population
–the lexicon can absorb some shocks of contacts with other groups or can be destabilized

What do the plots show?
•Ontology size as compared to communication success
•Communication success dips each time the ontology is enlarged with new concepts, since new words have to be invented to deal with them
•However, the agents clearly manage to become successful again in guessing
[Plot: curves labelled “communication success” and “ontology”]

What do the plots show?
•Increasing the population size
•The agents create a word without knowing that a word already exists somewhere in the population (as it takes time to propagate) → the risk of synonymy increases
•However, steady progress towards an effective communication system is noticed

What do the plots show?
•What happens when two populations interact?
•There is an initial destabilisation period when the coherence is low as the ambiguity increases
•However, the new community catches up and a new common lexicon emerges, abundant in synonyms

How to prove the origins of language?
•In a simplified form: language = lexicon + grammar
–The lexicon gives names for concepts and objects: the guessing games
–The grammar expresses relations between concepts and objects: how to put it in terms of interactions between agents?
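
The guessing-game protocol and the up-and-down score dynamics listed under “What is proved?” can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions that are not in the slides: both agents categorise the scene identically, every object carries a single HPOS category, scores start at 0.5 and move by a fixed step `delta`, and the names `Agent`, `play` and `simulate` are hypothetical.

```python
import random
from collections import defaultdict


class Agent:
    """A toy agent whose lexicon maps word -> {meaning: confidence score}."""

    def __init__(self, name, seed):
        self.name = name
        self.rng = random.Random(seed)
        self.lexicon = defaultdict(dict)

    def word_for(self, meaning):
        """Return the best-scored known word for `meaning`, inventing one if needed."""
        known = [(scores[meaning], word)
                 for word, scores in self.lexicon.items() if meaning in scores]
        if known:
            return max(known)[1]
        while True:  # invent a random two-syllable word not yet in the lexicon
            word = "".join(self.rng.choice("bdgkmnrz") + self.rng.choice("aeiou")
                           for _ in range(2))
            if word not in self.lexicon:
                break
        self.lexicon[word][meaning] = 0.5  # assumed initial score
        return word

    def interpret(self, word):
        """Return the best-scored meaning of `word`, or None if the word is unknown."""
        scores = self.lexicon.get(word)
        return max(scores, key=scores.get) if scores else None

    def adjust(self, word, meaning, delta):
        """Move the score of (word, meaning) by `delta`, clamped to [0, 1]."""
        old = self.lexicon[word].get(meaning, 0.5)
        self.lexicon[word][meaning] = min(1.0, max(0.0, old + delta))


def play(speaker, hearer, scene, topic, delta=0.1):
    """Play one guessing game; `scene` maps object id -> category label."""
    meaning = scene[topic]
    word = speaker.word_for(meaning)
    guess = hearer.interpret(word)
    pointed = [obj for obj, cat in scene.items() if cat == guess]
    success = pointed == [topic]
    if success:
        speaker.adjust(word, meaning, +delta)
        hearer.adjust(word, meaning, +delta)
    else:
        speaker.adjust(word, meaning, -delta)
        if guess is not None:
            hearer.adjust(word, guess, -delta)
        # The speaker points to the topic; the hearer adopts the new association.
        hearer.lexicon[word].setdefault(meaning, 0.5)
    return success


def simulate(n_games):
    """Alternate speaker and hearer roles over a fixed two-object scene."""
    agents = [Agent("A", seed=1), Agent("B", seed=2)]
    scene = {1: "HPOS[left]", 2: "HPOS[right]"}
    results = []
    for i in range(n_games):
        speaker, hearer = agents[i % 2], agents[(i + 1) % 2]
        topic = 1 + (i // 2) % 2  # two games about object 1, then two about object 2, ...
        results.append(play(speaker, hearer, scene, topic))
    return agents, results


agents, results = simulate(8)
print(results)  # [False, True, False, True, True, True, True, True]
```

Each new meaning costs one failed game (the hearer has to learn the invented word), after which communication succeeds; this mirrors, in miniature, the dip-and-recover pattern described for the plots.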
Guessing games: implicit assumptions
•Experiments are made in a controlled setting that simplifies many aspects:
–The world is simplified to the content of the table (scene)
–The agents have their attention focused on the scene
–Their “aim” is to identify objects (they are motivated)
–The agents have a set of channels which are sufficient to bring out identifying properties of the objects populating the scene
–The maximum vocabulary sufficient to cover the concepts, as values produced by channels, is strictly limited
–The words used by the agents apply to property values and not to objects
•This setting is assumed (given, programmed)

Establishing settings for exercising the birth of a grammar
•Setting 1:
–The world: the content of the table (scene)
–Attention: focused on the scene
–Motivation: identifying objects
–Channels: identify properties of objects (enriched)
–Maximum vocabulary: strictly limited
–The agents already share a background vocabulary for naming properties of objects (as in phase 1), not directly objects
–One property alone is not enough for disambiguation → necessity to use combinations of words to express conjunctions
–Implicit supposition: combinations express conjunction and not disjunction

Putting words together
New channels:
COLOR: black, red, blue
SHAPE: circle, triangle
[Figure: scene with four objects, numbered 1 to 4]
Important: the above symbols are our conventions to notate the mentioned property values; they do not belong to the acquired lexicon!
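
Setting 1 says that a single property may not be enough to single out the topic, so the speaker needs a conjunction of property values. A minimal sketch of that search is given below; the scene, the object numbering and the function name `discriminating_sets` are hypothetical, and a brute-force search over combinations stands in for whatever decision procedure the agents actually use.

```python
from itertools import combinations


def discriminating_sets(scene, topic):
    """Return the smallest conjunctions of (channel, value) pairs that are
    true of `topic` and of no other object in `scene`."""
    props = sorted(scene[topic].items())
    for size in range(1, len(props) + 1):
        found = []
        for combo in combinations(props, size):
            # Objects other than the topic that satisfy every pair in the conjunction.
            others = [obj for obj, feats in scene.items()
                      if obj != topic
                      and all(feats.get(chan) == val for chan, val in combo)]
            if not others:
                found.append(combo)
        if found:  # stop at the first size that discriminates
            return found
    return []


# Hypothetical four-object scene using the VPOS, SHAPE and COLOR channels.
scene = {
    1: {"VPOS": "low", "SHAPE": "circle", "COLOR": "red"},
    2: {"VPOS": "high", "SHAPE": "circle", "COLOR": "red"},
    3: {"VPOS": "low", "SHAPE": "triangle", "COLOR": "blue"},
    4: {"VPOS": "high", "SHAPE": "triangle", "COLOR": "black"},
}

for conjunction in discriminating_sets(scene, topic=1):
    print(" AND ".join(f"{chan}[{val}]" for chan, val in conjunction))
# COLOR[red] AND VPOS[low]
# SHAPE[circle] AND VPOS[low]
```

In this scene no single property identifies object 1, but two different two-property conjunctions do. Because a conjunction is order-independent, nothing in this search forces a word order on the resulting expression.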
Putting words together
Game 1014
A segments the scene in 4 objects
A categorizes the topic as {VPOS[low], SHAPE[circle], COLOR[red]}
A has the lexicon:
“bagadiru” for VPOS[low]
“gugeawa” for SHAPE[circle]
“camende” for COLOR[red]
A correctly identifies on the decision tree that VPOS[low] AND SHAPE[circle] are sufficient to identify the topic
A says: “bagadiru gugeawa”
B has the lexicon:
“bagadiru” for VPOS[low]
“camende” for COLOR[red]
B says: “gugeawa?”
A points to the topic
B categorizes the topic as {VPOS[low], SHAPE[circle], COLOR[red]}
B correctly discovers on the decision tree that either SHAPE[circle] or COLOR[red], in combination with VPOS[low], is sufficient to identify the topic
B stores “gugeawa” as both SHAPE[circle] and COLOR[red], with a confidence = 0.5
[Figure: scene with four objects, numbered 1 to 4]

Conclusions of a set of experiments of this kind
•Will not give rise to a grammar for expressing conjunctions: the implicit assumption was that putting words together restricts the selection (in conformity with most modern languages)
•The order of words is not important: “red circular” or “circular red”

One step further in building a grammar
•Setting 2:
–The world: the content of the table (scene)
–Attention: focused on the scene
–Motivation: identifying spatial relations among objects
–Channels: identify properties of objects BUT ALSO spatial relations among objects
–Maximum vocabulary: strictly limited
–The agents have as background a common vocabulary for naming properties of objects (as in phase 1), not directly objects
–Implicit supposition: in a linear expression Obj1 R Obj2, the focus is Obj1

New channels expressing spatial relations
•HREL: left-of, right-of
•VREL: above, below
[Figure: scene with four objects, numbered 1 to 4]
obj1 left-of obj2 (0.9)
obj1 left-of obj3 (0.4)
obj1 left-of obj4 (0.6)
obj3 left-of obj2 (0.1)
obj3 left-of obj4 (0.9)
…

But grammar is all about form
“HREL[right-of(obj2)]” expresses the concept “whatever is to the right of obj2”
One way to say that obj1 is in this relation: “obj1 lexical-item-for(right-of) obj2”
But, the
agent knows a term for “right”: “mo”
Then, it might combine this term with a new word expressing the concept of “relation”, for instance “ga”: “mo-ga”

Expressing relations between objects
Initial lexicon:
“mo” = HPOS[right]
“bagadiru” = VPOS[low]
“gugeawa” = SHAPE[circle]
“zamira” = SHAPE[triangle]
“camende” = COLOR[red]
“gamaru” = COLOR[gray]
[Figure: scene with five objects, numbered 1 to 5]
Derived expressions: “zamira mo-ga gugeawa”

Expressing relations between objects
Initial lexicon: as on the previous slide
Derived expressions: “camende gugeawa mo-ga bagadiru gamaru gugeawa”

Guessing relations
Game 2020
A segments the scene in 5 objects
A categorizes the topic as {VPOS[low], HPOS[right], SHAPE[circle], COLOR[red], HREL[right-of(SHAPE[circle])], HREL[right-of(COLOR[gray])], …}
Both A and B have the lexicon:
“mo” = HPOS[right]
“bagadiru” = VPOS[low]
“gugeawa” = SHAPE[circle]
“zamira” = SHAPE[triangle]
“camende” = COLOR[red]
“gamaru” = COLOR[gray]
A correctly identifies on the decision tree that HREL[right-of(obj1)] is sufficient to identify the topic
A identifies obj3 as {SHAPE[circle], VPOS[low], COLOR[red]}
A says: “camende gugeawa mo-ga bagadiru gamaru gugeawa”
B says: “mo-ga?”
A points to the topic
B identifies “camende gugeawa” as either obj2 or obj3 but, based on A’s pointing, eliminates obj2
B identifies “bagadiru gamaru gugeawa” as obj1
B stores “mo-ga” as HREL[right-of()]

How to diminish the amount of initial assumptions?
•Remember our Setting 2:
–…
–Implicit supposition: in a linear expression Obj1 R Obj2, the focus is Obj1
•This is a direct and artificial interference in the very heart of the birth process of the language!
•Solution: parameterize all grammatical features of the language and let them evolve naturally

•Word order and prepositions
John gave Mary a book in the library
•Affixes
Das Mädchen gibt den schweren Koffer ihres Bruders den Freunden
The girl gives the heavy suitcase (of) her brother (to) the friends
•Inflection
Brutus Marcello librum dedit
Brutus Marcellus book gave
Brutus gave a book to Marcellus [Source: Palmer, p. 8]
•Particles
Tanaka-san wa Tokyo de o-to-san ni atta
Tanaka TOPIC Tokyo (in) father (loc) meet
Tanaka met his father in Tokyo
(from Luc Steels)

Examples of semantic roles:
Agent: the instigator of the event
Counter-agent: the force or resistance against which the action is carried out
Object: the entity that moves or changes or whose position or existence is in consideration
Result: the entity that comes into existence as a result of the action
Instrument: the stimulus or immediate physical cause of the event
Source: the place from which something moves
Goal: the place to which something moves
Experiencer: the entity which receives or accepts or experiences or undergoes the effect of an action
Source: Fillmore (from Luc Steels)

Fluid Construction Grammar
•Structures to represent the information needed in language processing about a specific sentence (feature structures)
•Structures to represent the lexical and grammatical constructions (rules)
•Operations of Unify and Merge
•Structures specifying how new rules are built (templates)
a chemical metaphor (from Luc Steels)
[Diagram labels: Sensori-motor, Conceptual, Form]

Grammatical Constructions
•Semantic and syntactic categories do not operate in isolation
•They are part of frames (schemas, patterns)
•Constructions are mappings from semantic to syntactic schemas
•Constructions can also add meaning beyond the meaning of the parts, and add form
Fillmore, Kay, Michaelis, Croft, Goldberg, … (from Luc Steels)

Conclusion
•Language is considered from an evolutionist point of view
•Lexicon formation: proved experimentally
•Grammar and
dialogue: were under study in ALEAR
–Perhaps Fluid Construction Grammar
•integrates syntax and semantics (Fillmore’s roles)
•unification and merge mechanisms
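
The unification mentioned above operates on feature structures. As a rough illustration only, the sketch below shows generic textbook unification over nested dictionaries with atomic leaves and no variables. It is not FCG's actual Unify or Merge operator, and the feature names (`cat`, `meaning`, `channel`, `value`) are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the combined structure, or None when two atomic values clash.
    Assumes None is never used as a feature value.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[key] = merged
            else:
                result[key] = b_val  # feature only present in b: copy it over
        return result
    return a if a == b else None


# A hypothetical lexical entry for the relation word "mo-ga".
entry = {"form": "mo-ga", "cat": "relation",
         "meaning": {"channel": "HREL", "value": "right-of"}}

# A pattern that requires a horizontal relation and adds a syntactic role.
pattern = {"cat": "relation", "meaning": {"channel": "HREL"}, "role": "linker"}

print(unify(entry, pattern))
print(unify(entry, {"cat": "property"}))  # None: the categories clash
```

The first call succeeds and returns the entry enriched with the `role` feature from the pattern; the second fails because `relation` and `property` cannot be reconciled.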